Abstract

This paper proposes a reconfigurable pipelined multiplier
architecture that achieves high performance and very low energy
dissipation by adapting its structure to computational requirements
over time. In this reconfigurable multiplier, energy is saved by
disabling and bypassing an appropriate number of pipeline stages
whenever input data rates are low. To evaluate the efficiency of our
multiplier architecture, we have designed a multiplier-based inverse
quantizer (IQ) for MPEG-2 MP@ML. Pipelines are dynamically
reconfigured according to the size of the picture and the number of
nonzero quantized DCT coefficients per block. In comparison with
corresponding multiplier implementations that use conventional
pipelines, our reconfigurable multipliers dissipate about 31-58% less
energy. Relative energy savings increase with decreasing data rates,
since our reconfigurable structures stay in a low-energy
configuration for a proportionately longer time.


1  Introduction 

Multimedia wireless communications have resulted in a growing demand
for energy-efficient video processing. Next-generation portable
devices must provide support for low-energy encoding/decoding and
transmission of multimedia information. Several video standards are
currently in use, including MPEG-1 and MPEG-2 for multimedia
applications, and H.261 for video-phone and video-conferencing
applications. Implementations of these standards for mobile
system-on-chip devices should provide substantial computing
capabilities at low energy consumption levels [1, 8, 9].

Multiplication is a key arithmetic operation in video processing. The
development of multipliers with short critical paths and low power
consumption has become the topic of extensive recent investigation
[2, 5, 6, 12]. Pipelining is a popular technique for realizing
high-performance, high-efficiency CMOS multipliers, because it allows
the supply voltage to be reduced to the lowest possible level while
still satisfying throughput constraints. In deep pipelines, however,
registers are responsible for an increasingly large fraction of total
dissipation, no matter how efficiently they may have been implemented
[4, 7, 10, 13]. Even if clock gating is used to store only essential
data, pipeline registers may still be latching their inputs
unnecessarily if throughput requirements are lower than the maximum
specified.

This paper presents a reconfigurable pipelined multiplier that adapts
its performance and dissipation to its computational load over time.
Our multiplier is capable of adapting its structure within one clock
cycle, if required. It can thus efficiently cope with variable
data-rate multimedia applications such as video processing. Energy
dissipation is reduced by disabling and bypassing a select subset of
registers based on the specified throughput requirements and the
anticipated computational load. This information is
application-dependent and can be inferred at high abstraction levels.
In the context of inverse quantization for video processing, for
example, our multiplier can be adapted by simply counting the number
of nonzero coefficients in each encoded block. Our reconfiguration
approach can be applied to general linear pipelines. It can also be
combined with voltage scaling to further increase energy efficiency.

To evaluate the efficiency of our reconfigurable multiplier
architecture, we designed a multiplier-based inverse quantizer (IQ)
for MPEG-2 MP@ML. Pipelines were dynamically reconfigured according
to the picture size and the number of nonzero quantized discrete
cosine transform (DCT) coefficients per block. In simulations with a
0.35 µm standard-cell CMOS technology and MPEG-2 MP@ML bitstreams,
our reconfigurable IQ was up to 58% less dissipative than its
non-reconfigurable, statically pipelined counterpart. Moreover,
relative reductions increased as data rates decreased, since the
reconfigurable multiplier stayed in a low-energy mode for a
proportionally longer time.

The remainder of this paper has five sections. In Section 2 we
briefly review the MPEG video processing standard. Section 3 gives
background on multiplier-based IQ design and motivates the use of
reconfigurable multipliers for improving the energy efficiency of
IQs. Section 4 describes our design methodology for high-performance,
low-energy reconfigurable multipliers. Section 5 presents our
comparative evaluation of IQs for a variety of bitstreams and bit
rates. Our contributions and ongoing research are summarized in
Section 6.

2  MPEG  Video  Processing 

In the MPEG video standard, a coded frame consists of a frame picture
or a pair of field pictures. There are three types of pictures: intra
(I), predictive (P), and bidirectional (B). Each picture is divided
into non-overlapping macroblocks of 16 x 16 pixels. Each macroblock
(MB) consists of four luminance blocks (Y) and two chrominance blocks
(Cb and Cr), each 8 x 8 pixels. Since I-pictures are coded without
reference to neighboring pictures in the sequence, their coding
exploits only the correlations within the picture. P-pictures and
B-pictures are coded as differences between the picture being coded
and a reference picture. If there is motion in the sequence, a better
prediction can be obtained from pixels in the reference picture that
are shifted relative to the current picture pixels.


Figure  1.  Block  diagram  of  basic  MPEG  video 
encoder  and  decoder. 


The block diagram of the basic MPEG video encoder and decoder
structure is shown in Figure 1. In general, the encoder comprises a
DCT/IDCT, a motion estimator/compensator, a Q/IQ, and a
variable-length coder (VLC). The decoder performs the reverse
operations of the encoder and consists of a variable-length decoder
(VLD), an IQ, an IDCT, and a motion compensator.


Table 1. Upper bounds for picture size, frame rates, and bit rates.

MPEG video standards only specify the video bitstream syntax, the
decoding semantics, and the required maximum performance for decoding
a bitstream of any particular type. Table 1 gives the upper bounds
specified by MPEG for picture size, frame rate, and bit rate for
various combinations of profiles and levels. For example, for the
main-profile/main-level combination (MPEG-2 MP@ML for short) in the
NTSC-compatible mode, the picture size, frame rate, and bit rate are
720 x 480 pixels, 30 frames/sec, and 15 Mbit/sec, respectively. Thus,
the maximum throughput requirement for an MPEG-2 MP@ML decoder is
15.552 x 10^6 samples/sec (30 frames/sec x (720/16 x 480/16) MB/frame
x 6 blocks/MB x 64 pixels/block). Therefore, each building block in
Figure 1, such as the IQ, should be designed to meet this maximum
throughput requirement, even though the average data rate is usually
much less than the upper bound specified by the standard.
Consequently, the decoder may be performing no useful computation for
a large fraction of the total number of cycles.
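
The peak-throughput arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `required_throughput` and the choice to round partial macroblocks up at the picture edge are assumptions made here, not part of the original text.

```python
def required_throughput(width, height, frames_per_sec):
    """Samples per second a decoder stage must sustain for one picture format."""
    mb_size = 16            # macroblocks are 16 x 16 pixels
    blocks_per_mb = 6       # four luminance blocks plus two chrominance blocks
    samples_per_block = 64  # each block is 8 x 8
    # Assumption: partial macroblocks at the picture edge are rounded up.
    mbs_per_row = -(-width // mb_size)
    mbs_per_col = -(-height // mb_size)
    mbs_per_frame = mbs_per_row * mbs_per_col
    return frames_per_sec * mbs_per_frame * blocks_per_mb * samples_per_block


peak = required_throughput(720, 480, 30)
print(peak)  # 15552000, i.e. 15.552 x 10^6 samples/sec
```

The same function gives the requirement for smaller pictures, which is what makes picture size usable as a reconfiguration criterion later in the paper.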

3  Inverse  Quantization 

In this section, we describe the inverse quantization procedure in
the MPEG video standard, largely following the description given in
[8]. We subsequently motivate the use of reconfigurable multipliers
for reducing its energy dissipation.

The DCT-based coding/decoding for an 8 x 8 block of pixels is common
to all picture types and plays a central role in the MPEG video
standard. The DCT has certain properties that simplify coding models
and make it possible to code coefficients using perceptual quality
measures. Basically, the DCT is a method for decomposing a block of
pixels into a weighted sum of spatial frequencies. If only the
low-frequency DCT coefficients are nonzero, the pixels in the block
vary slowly with position. If high frequencies are present, the block
intensity changes rapidly from pixel to pixel. When the DCT is
computed for a block of pixels, it is desirable to represent the
coefficients for high spatial frequencies with less precision. This
is done by a process called quantization. A DCT coefficient is
quantized by dividing it by a nonzero positive integer called a
quantization value and rounding the quotient (the quantized DCT
coefficient) to the nearest integer. The larger the quantization
value, the lower the precision of the quantized coefficient and the
fewer bits needed to transmit it to a decoder. The use of large
quantization values for high spatial frequencies allows the encoder
to selectively discard high spatial frequency activity that the human
eye cannot readily perceive. The quantization values are chosen so as
to minimize perceived distortion in the reconstructed pictures, using
principles based on the human visual system.
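
As a small illustration of the divide-and-round step described above, the following Python sketch quantizes a single integer DCT coefficient. The function name `quantize` and the tie-breaking rule (halves round away from zero) are assumptions made for this example; the text only states that the quotient is rounded to the nearest integer.

```python
def quantize(coefficient, quantization_value):
    """Divide a DCT coefficient by a positive integer and round to nearest."""
    if quantization_value <= 0:
        raise ValueError("quantization value must be a positive integer")
    quotient, remainder = divmod(abs(coefficient), quantization_value)
    # Assumption: exact halves are rounded away from zero.
    if 2 * remainder >= quantization_value:
        quotient += 1
    return quotient if coefficient >= 0 else -quotient


# A larger quantization value leaves less precision: the same small
# coefficient survives with a value of 8 but becomes zero with 16.
print(quantize(7, 8))   # 1
print(quantize(7, 16))  # 0
```

Coefficients that quantize to zero are the ones the inverse quantizer never has to multiply, which is the property the rest of this section relies on.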

The coding of quantized DCT coefficients is lossless, that is, the
decoder is able to reproduce exactly the quantized DCT coefficients
computed by the encoder. Before the 2-D IDCT is performed, these
coefficients are inverse quantized. This process is essentially a
multiplication by the quantizer step size. For each block, the
quantizer step size is determined by the product of a weighting
matrix and a quantizer scale factor. The following equation specifies
how to reconstruct the DCT coefficients from the quantized ones.

$$F(v, u) = \frac{(2 \cdot QF(v, u) + k) \cdot W(w, v, u) \cdot S_q}{32},$$

where $F(v, u)$ is an 8 x 8 matrix of reconstructed DCT coefficients,
$QF(v, u)$ is the corresponding quantized DCT coefficient matrix,
$W(w, v, u)$ is a weighting matrix ($w = 0$ for intra coded blocks,
$w = 1$ for non-intra coded blocks), $k$ is a parameter ($k = 0$ for
intra coded blocks, $k = \mathrm{Sign}(QF(v, u))$ for non-intra coded
blocks), and $S_q$ is the quantizer scale.
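
The reconstruction equation can be sketched for a single coefficient as follows. The names `inverse_quantize` and `sign` are hypothetical, the caller is assumed to pass the weighting-matrix entry already selected for the block type, and the division by 32 is assumed here to truncate toward zero, which the text above does not spell out.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def inverse_quantize(qf, weight, quantizer_scale, intra):
    """Reconstruct one DCT coefficient from its quantized value qf."""
    k = 0 if intra else sign(qf)
    numerator = (2 * qf + k) * weight * quantizer_scale
    # Assumption: integer division by 32 truncates toward zero.
    magnitude = abs(numerator) // 32
    return magnitude if numerator >= 0 else -magnitude


print(inverse_quantize(3, 16, 8, intra=True))    # 24
print(inverse_quantize(-3, 16, 8, intra=False))  # -28
print(inverse_quantize(0, 16, 8, intra=False))   # 0
```

A zero-valued quantized coefficient always reconstructs to zero, so it needs no multiplier operation at all.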

DCT blocks of MPEG compressed video sequences usually have only five
or six nonzero coefficients, mainly located in the low spatial
frequency positions. Given such input data statistics, the number of
operations per block can be reduced, since multiplication and
addition with a zero-valued DCT coefficient constitute no operation.
The number of operations required for each block can be predicted
precisely and effortlessly, before the IQ is started, by observing
the operation of the VLD. To achieve the maximum IQ throughput for
MPEG-2 MP@ML (15.552 x 10^6 samples/sec), deeply pipelined
multipliers can be used. When there are many zero coefficients,
however, this peak performance is not required and results in wasted
energy.

Reconfigurable pipelined multipliers can save energy by adapting to
the throughput needs of the input stream over time. The number of
nonzero quantized DCT coefficients per block equals the number of
operations required to perform the block's inverse quantization. As
the number of nonzero quantized DCT coefficients per block decreases,
the throughput required of the multiplier also decreases, and a
shallower pipeline configuration can be used, thus saving energy.
Picture size can be used to achieve additional energy savings. For
example, if the picture size is a quarter of the maximum picture size
for MPEG-2 MP@ML, the number of blocks that must be processed within
1/30 second decreases by a factor of 4. Therefore, the multiplier
throughput may decrease by a factor of 4, independent of the number
of nonzero quantized DCT coefficients per block.


4  Reconfigurable  Multiplier  Design 


This section highlights our methodology for designing
performance-driven reconfigurable multipliers. In our proposed
reconfigurable structure, whenever throughput requirements are low,
register stages are selectively disabled by gated clocks and bypassed
by multiplexors.


Figure  2.  Reconfigurable  4-stage  pipeline. 


Figure 2 shows our 4-stage pipelined reconfigurable structure. The
throughput of a conventional pipelined structure is fixed at one
operation per cycle, whereas the throughput of our reconfigurable
pipelined structure may be set to one operation every one, two, or
four cycles, depending on the input data rates.

Figure 3 shows the timing diagram of a non-reconfigurable pipeline
that processes 2 samples over 8 clock cycles. Figure 4 gives the
timing diagram of the reconfigurable 4-stage pipeline, configured in
its single-stage mode. The 4-stage pipeline is capable of handling
the maximum throughput requirement of 8 samples per 8 cycles. When
only 2 samples need to be processed in 8 cycles, however, the
conventional 4-stage pipeline remains idle for 6 cycles. The
reconfigurable pipeline of Figure 2 uses these idle cycles to spread
the computation and eliminate three stages of registers. Three
register stages are disabled by gated clocks and bypassed through
multiplexors, thus saving a significant fraction of the datapath's
total dissipation.

Figure 3. Timing diagram of non-reconfigurable 4-stage pipeline for 2
samples over 8 cycles.

Figure 4. Timing diagram of reconfigurable 4-stage pipeline in
single-stage mode for 2 samples over 8 cycles.
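
The choice of configuration in this example can be sketched as a small search for the shallowest mode that still meets the load. The model below is a simplification assumed for illustration: the stage count is a power of two, the available modes are the power-of-two stage counts, and a pipeline of `total_stages` stages configured with `stages` active register stages accepts one operation every `total_stages / stages` cycles. The function name `active_stages_needed` is hypothetical.

```python
def active_stages_needed(total_stages, window_cycles, samples):
    """Smallest number of active register stages that still meets the load."""
    if samples > window_cycles:
        raise ValueError("load exceeds one operation per cycle")
    stages = 1
    while stages < total_stages:
        cycles_per_operation = total_stages // stages
        if samples * cycles_per_operation <= window_cycles:
            return stages
        stages *= 2  # try the next deeper configuration
    return total_stages


# 2 samples in 8 cycles fit the single-stage mode of a 4-stage pipeline,
# while 8 samples in 8 cycles need all four stages.
print(active_stages_needed(4, 8, 2))  # 1
print(active_stages_needed(4, 8, 8))  # 4
```

Under this model every mode shallower than the full pipeline leaves the disabled register stages unclocked, which is where the energy saving comes from.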


Array multiplier structures are popular due to their simple and
regular interconnections. Figure 5 shows a 4-stage pipelined 4 x 4
array multiplier designed using our reconfiguration technique. This
multiplier can be reconfigured as a 1-, 2-, or 4-stage pipeline
according to the throughput desired. We designed, synthesized, and
simulated 16 x 16 multipliers with different numbers of pipeline
stages using a 0.35 µm CMOS standard-cell technology. The registers
of the reconfigurable multipliers were implemented by positive
edge-triggered D flip-flops using transmission gates. This kind of
flip-flop is commonplace in standard-cell design [11]. To obtain
power estimates, we used the switch-level circuit simulator IRSIM
with RC parameters extracted using EPOCH, a commercial Verilog-HDL
synthesizer and standard-cell router.




Figure 5. 4 x 4 array multiplier with reconfigurable 4-stage
pipeline.


Figure 6 shows the relative energy savings per operation for 4-, 8-,
and 16-stage reconfigurable multipliers over non-reconfigurable 4-,
8-, and 16-stage pipelined ones, respectively. For example, the
second bar from the left gives the relative savings of an 8-stage
reconfigurable multiplier over an 8-stage conventional multiplier
when the reconfigurable multiplier is organized as a single-stage
pipeline. In this case, the reconfigured 8-stage multiplier saves
more than 58% of the dissipation of the conventional 8-stage
multiplier. The negative savings when the multipliers are configured
with all 4, 8, or 16 stages are due to the dissipation of the
reconfiguring hardware. These numbers were obtained using IRSIM with
uniformly distributed random inputs. The overall energy savings of
these multipliers depend on the required maximum performance and the
statistics of data rates over time.

5  Simulation  Results 

This section presents a comparative evaluation of a reconfigurable
pipelined IQ with a conventional one on a variety of MPEG-2 MP@ML
bitstreams. We first focus on one test bitstream at a fixed bit rate
and provide detailed evidence about the effectiveness of our
coefficient-based reconfiguration criterion. We then give simulation
results that demonstrate the significant relative savings that can be
achieved using our reconfigurable pipelined multiplier for a variety
of bitstreams and bit rates.

Figure 6. Relative energy reduction per operation for reconfigurable
4-, 8-, and 16-stage multipliers over non-reconfigurable 4-, 8-, and
16-stage pipelined ones, respectively.

The potential of reconfiguration to reduce energy dissipation can be
best understood by examining the statistics of flower garden, one of
the bitstreams we experimented with. Figure 7 gives the cumulative
distribution of the number of nonzero quantized DCT coefficients per
block for the IQ of the MPEG bitstream flower garden at an average
bit rate of 6.0 Mbit/sec. This video is one of the test bitstreams of
the MPEG video committee and consists of 38 I-pictures, 113
P-pictures, and 299 B-pictures. The resolution of each picture is
704 x 480 pixels. To meet the maximum throughput requirement of the
IQ with low area penalty and low supply voltage, we used a
reconfigurable 8-stage pipelined 16 x 16 multiplier operating at
31.104 MHz with a 1.40 V supply. Each reconfiguration mode of this
multiplier is guaranteed not to change for at least 128 cycles.
Figure 7 shows that 38% of the nonzero quantized DCT coefficients in
the bitstream are found in blocks with at most 8 nonzero
coefficients. To process each element of these blocks, it suffices to
configure the 8-stage pipelined multiplier as a single-stage
pipeline, thus eliminating all the intermediate register stages. The
throughput of the single-stage multiplier decreases to one sample
every 8 cycles. For these blocks, the relative energy savings per
operation for the 8-stage multiplier configured as a single-stage
pipeline over the non-reconfigurable 8-stage one are 58%, as shown in
Figure 6. Therefore, when flower garden is processed, these blocks
contribute approximately 22.0% (= 0.38 x 58%) to the total relative
savings. Similarly, 23% of the nonzero coefficients in flower garden
can be found in blocks with 9-16 nonzero coefficients that can be
processed by a two-stage configuration of the 8-stage multiplier, and
31% of the nonzero coefficients can be found in blocks with 17-32
nonzero coefficients that can be processed by a 4-stage configuration
of the 8-stage multiplier. Finally, about 0.8% of the nonzero
coefficients are elements of blocks with more than 32 nonzero
coefficients. In this case, the 8-stage 16 x 16 multiplier should be
used in its full 8-stage configuration for maximum throughput. By
adding the energy savings for each reconfiguration mode, we conclude
that the total relative energy savings with a reconfigurable 8-stage
multiplier for flower garden are 42.84%.

Figure 7. Cumulative distribution for the number of nonzero
coefficients per block in flower garden at 6 Mbit/sec. (Horizontal
axis: number of nonzero coefficients per block.)
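
The coefficient-based criterion described above can be sketched in Python as follows. The thresholds (at most 8, 9-16, 17-32, and more than 32 nonzero coefficients) are the ranges quoted in the text for the 8-stage multiplier; the function names `count_nonzero` and `stages_for_block` and the sample block are illustrative assumptions. Only the single-stage contribution is computed, since it is the only per-mode saving stated numerically in the text.

```python
def count_nonzero(block):
    """Number of nonzero quantized DCT coefficients in an 8 x 8 block."""
    return sum(1 for row in block for coefficient in row if coefficient != 0)


def stages_for_block(nonzero_count):
    """Pipeline depth of the 8-stage multiplier for one block."""
    if not 0 <= nonzero_count <= 64:
        raise ValueError("an 8 x 8 block has between 0 and 64 coefficients")
    if nonzero_count <= 8:
        return 1
    if nonzero_count <= 16:
        return 2
    if nonzero_count <= 32:
        return 4
    return 8


# Hypothetical block with three low-frequency nonzero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 12, -3, 5
print(stages_for_block(count_nonzero(block)))  # 1

# Contribution of the single-stage mode to the total savings:
# 38% of the nonzero coefficients, each saving 58% per operation.
contribution = 0.38 * 58.0
print(f"{contribution:.2f}")  # 22.04
```

In hardware the count would come from observing the VLD rather than from scanning the block, as noted in Section 3.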


Figure 8. Cumulative distribution for the number of nonzero
coefficients per block in flower garden at various bit rates.

Figure 8 gives statistics for the number of nonzero quantized DCT
coefficients per block for different bit rates of the same source
image flower garden. The percentage of nonzero quantized DCT
coefficients in blocks with a small number of nonzero DCT
coefficients increases as bit rates decrease. For lower bit rates,
therefore, the reconfigurable multipliers spend a longer time in a
"shallow", energy-efficient mode, and thus greater relative savings
can be achieved. The picture size of flower garden at 1.5 Mbit/sec is
352 x 240 pixels, which is 1/4 of the maximum picture size, 720 x 480
pixels, in MPEG-2 MP@ML.


Figure 9. Relative energy savings with a reconfigurable 8-stage
multiplier-based IQ for susi, table tennis, mobile, and flower garden
at various bit rates. (Horizontal axis: bit rate in Mbit/s.)


Figure 9 shows the relative energy savings that can be achieved using
the IQ with reconfigurable multipliers over their non-reconfigurable
counterparts for the MPEG-2 MP@ML bitstreams susi, table tennis,
mobile, and flower garden. Results are given for five different bit
rates: 1.5, 4.0, 6.0, 8.0, and 12.0 Mbit/sec for each bitstream. Our
results show that relative savings increase as video bit rates
decrease and can exceed 58%.

6  Conclusion 

In this paper, we have presented a novel methodology for designing
reconfigurable multipliers for multimedia systems that adapt to
variations in the input data rate. Our approach has been applied to
the design of an inverse quantizer for MPEG-2 MP@ML and has resulted
in energy reductions of up to 58%. Comparable energy savings have
been observed in preliminary experiments with the inverse discrete
cosine transform [3]. Our approach is also applicable to other signal
processing computations and can be used to design energy-efficient
pipelined arithmetic circuitry for applications with dynamically
varying throughput rates.